Too Little for Big Data?

Authors

  • Jane Huffman Hayes
  • Giulio Antoniol
  • Licong Cui
  • Tingting Yu
Abstract

Context and motivation. Trace matrices are the lynchpin of verification and validation activities that must be performed for mission- and safety-critical software systems: criticality analysis, completeness analysis, change impact analysis, etc. Studies have shown that automated traceability techniques can achieve high recall and sometimes acceptable precision when used to generate trace matrices [1]. Many critical software systems require a human analyst in the loop to vet the auto-generated trace matrices. Studies have shown that humans are fallible and tend to decrease the accuracy of auto-generated trace matrices [2, 3, 4]. To address the need for improved matrix quality and synergy with analysts, researchers are examining methods that have received wide acclaim. We surmise that “big data,” deep learning, and meta-heuristic search are three categories of interest. Big data refers to “an emerging data science paradigm of multi-dimensional information mining for scientific discovery and business analytics over large-scale infrastructure” [5]. In addition, deep learning [7] has proven effective on complex classification problems [7,8]. However, not all data are created equal, and some data are likely more important than others [11]; unfortunately, exhaustive search is often infeasible, and we must resort to heuristic methods [10].

Problem statement. Automated trace link generation techniques suffer from low precision and lack of synergy with human analysts. Big data technologies, deep learning, and heuristic optimization could potentially assist with automated trace link generation, given the enormous volume of software artifact data and its complex structure.

Ideas and results. We plan to characterize trace generation as an unbalanced big data classification problem. For example, we will examine the typical size of software engineering artifacts, the software elements that comprise the artifacts, and the diversity and granularity of the datasets, and align them with the prerequisites of big data techniques. Traceability datasets may appear too small for big data techniques, since software engineering artifacts generally consist of thousands of elements rather than millions or billions; even so, we can borrow semantically rich word encodings from natural language processing techniques [8]. Possibilities for addressing the size limitation include increasing granularity in order to expand the size of the datasets, deriving more data elements that capture disparate aspects of the datasets, etc. However, new ideas are needed to properly handle the challenge of highly unbalanced datasets in which only a handful of true links exist. We expect that expanding the size of the data could simplify the rebalancing problem and improve the accuracy of trace generation. A second possibility would be to formulate the classifier-rebalancing problem as a search problem [10], or to model trace recovery as a classification task in which deep learning techniques place true traces close together in the feature space, making the similarity between true links higher. Alternatively, we may use big data representations for trace elements, such as directed acyclic graphs, and perform concept mining over the graphs [6].

Contributions and future directions. Inspired by prior work [10,11,12], we plan to capitalize on big data approaches successfully applied to biomedical problems [6], heuristic optimization [10], and deep learning [7,8,12], and learn how to apply them to the trace link generation problem.

Keywords: big data; trace link generation; traceability
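To make the imbalance described in the abstract concrete, the following minimal sketch counts candidate trace links between two artifact sets and rebalances them by random oversampling. Everything in it is an illustrative assumption rather than the authors' method: the artifact names, the four true links, and the helper functions `candidate_pairs` and `oversample_minority` are hypothetical, and random oversampling merely stands in for the rebalancing strategies the abstract proposes to investigate.

```python
import random

def candidate_pairs(sources, targets):
    """Treat every (source, target) pair as a candidate trace link."""
    return [(s, t) for s in sources for t in targets]

def oversample_minority(pairs, true_links, seed=0):
    """Randomly duplicate true links until both classes have equal size."""
    rng = random.Random(seed)
    positives = [p for p in pairs if p in true_links]
    negatives = [p for p in pairs if p not in true_links]
    if not positives:
        raise ValueError("no true links to oversample")
    shortfall = len(negatives) - len(positives)
    extra = [rng.choice(positives) for _ in range(shortfall)]
    return positives + extra, negatives

# Hypothetical artifacts: 50 requirements traced to 40 design elements.
requirements = [f"R{i}" for i in range(50)]
design_elements = [f"D{j}" for j in range(40)]
# Hypothetical ground truth: only a handful of pairs are true links.
true_links = {("R0", "D3"), ("R7", "D7"), ("R12", "D30"), ("R49", "D1")}

pairs = candidate_pairs(requirements, design_elements)
imbalance = len(true_links) / len(pairs)
positives, negatives = oversample_minority(pairs, true_links)

print(f"{len(pairs)} candidate pairs, {len(true_links)} true links ({imbalance:.2%})")
print(f"after oversampling: {len(positives)} positive, {len(negatives)} negative")
```

Even with these small, made-up artifact sets, true links are a tiny fraction of all candidate pairs, which is why a classifier trained on the raw pairs would be biased toward predicting "no link".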

Similar resources

Leveraging Big Data for Competitive Advantage

Big data means data sets which are too large, too unstructured and too fast changing, to use traditional data management methods. Enterprises that want to collect and process this data need new solutions for data processing and analysis. The aim of this paper is to identify the potential of big data analytics (BDA) as a source of competitive advantage of manufacturing companies in the market. T...

Was there too little entry during the Dot Com Era?

We present four stylized facts about the Dot Com Era: (1) there was a widespread belief in a “Get Big Fast” business strategy; (2) the increase and decrease in public and private equity investment was most prominent in the internet and information technology sectors; (3) the survival rate of dot com firms is on par or higher than other emerging industries; and (4) firm survival is independent o...

Big Data Analytics with Hadoop to analyze Targeted Attacks on Enterprise Data

Big Data describes data sets that are too large, too unstructured or too fast changing for analysis. Big Data analytics is the process of analyzing and mining Big Data. Due to increase in number of sophisticated targeted threats and rapid growth in data, the analysis of data becomes too difficult. Today's Big Data security analytics systems rely on untrustworthy data. As organizations open and ...

The Big Idea: Before You Make That Big Decision... - Harvard Business Review

Thanks to a slew of popular new books, many executives today realize how biases can distort reasoning in business. Confirmation bias, for instance, leads people to ignore evidence that contradicts their preconceived notions. Anchoring causes them to weigh one piece of information too heavily in making decisions; loss aversion makes them too cautious. In our experience, however, awareness of the...

Was There Too Little Entry during the Dot Com Era?

We present four stylized facts about the Dot Com Era: (1) there was a widespread belief in a Get Big Fast business strategy; (2) the increase and decrease in public and private equity investment was most prominent in the internet and information technology sectors; (3) the survival rate of dot com firms is on par or higher than other emerging industries; and (4) firm survival is independent of priv...

Exploring Culture-Related Content in English Textbooks: A Closer Look at Advanced Series of Iran Language Institute

The aim of this article was to examine three advanced textbooks in Iran Language Institute (ILI) in an attempt to establish if they differ in the extent to which they represent dimension of big ‘C’ culture and little ‘c’ culture, their stance in distribution of references of cultural category, and also what themes predominate. The analysis identifies just the cultural elements, and culture–free...

Publication date: 2017